Web Structure in 2005

نویسندگان

  • Yu Hirate
  • Shin Kato
  • Hayato Yamana
چکیده

The estimated number of static web pages in Oct 2005 was over 20.3 billion, which was determined by multiplying the average number of pages per web server based on the results of three previous studies, 200 pages, by the estimated number of web servers on the Internet, 101.4 million. However, based on the analysis of 8.5 billion web pages that we crawled by Oct. 2005, we estimate the total number of web pages to be 53.7 billion. This is because the number of dynamic web pages has increased rapidly in recent years. We also analyzed the web structure using 3 billion of the 8.5 billion web pages that we have crawled. Our results indicate that the size of the ”CORE,” the central component of the bow tie structure, has increased in recent years, especially in the Chinese and Japanese web.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. In our extraction algorithm, we inspired by the way a human user scans the page content for specific data. In particular, we use text fea...

متن کامل

Assessing the Internal Structure of the Ellis Information Retrieval Model in Order to Present the Persian Norm of Web Retrieval Tools

Introduction: Study evaluated the internal structure of Ellis information seeking model in the student community with the aim of presenting the Persian norm. Methods: This is a descriptive-analytical study conducted by cross-sectional survey method in the second semester of the academic year 1399-1400. Population comprise of 280 graduate students at Ahvaz Jundishapur University of Medical Scien...

متن کامل

Web Usage Mining Using Rough Agglomerative Clustering

Tremendous growth of the web world incorporates application of data mining techniques to the web logs. Data Mining and World Wide Web encompasses an important and active area of research. Web log mining is analysis of web log files with web pages sequences. Web mining is broadly classified as web content mining, web usage mining and web structure mining. Web usage mining is a techniques to disc...

متن کامل

Flexible Web Services Discovery and Composition using SATPlan and A* Algorithms

Agents with web services based interfaces have service description encoded in machine-understandable formats so that they can easily interact with other agents. Therefore, locating right agents and combining them to form more complex services becomes increasingly an important task on the Web. However, when there are a large number of web services based agents available, it is non-trivial to qui...

متن کامل

Intellectual Structure of Knowledge in Information Behavior: A Co-Word Analysis

Background and Aim: The intellectual structure of knowledge and its research front can be identified by co-word analysis. This research attempts to reveal the intellectual structure of knowledge in information behavior inquiries, via co-word, network analysis, and science visualization tools. Methods: Bibliometric methodology and social network analysis are used. Population comprises 2146 recor...

متن کامل

Recovering Individual Accessing Behaviour from Web Logs

In this paper, we present a new view on the data preparation in web usage mining. We concentrate on recovering individual usage behaviour from accessing records on web site. We defined five categories of individual behaviours such as granular accessing behaviour, linear sequential behaviour, tree structure behaviour, acyclic routing behaviour and cyclic routing behaviour. The algorithms for rec...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006